perf: Reduce compile time by trimming template expansion in IBA #4476

Merged

lgritz
Collaborator

@lgritz lgritz commented Oct 4, 2024

I was profiling the builds and saw that modules with lots of template expansion dominate the compile time. For example, imagebufalgo_pixelmath.cpp alone took 290s to compile on my 2020 MacBook Pro (!), and imagebufalgo_addsub.cpp took 84s.

This is all due to the combinatorics of expanding IBA templates via the DISPATCH macros in imagebufalgo_util.h separately for every pixel data type that each image argument can have. But I claim that most combinations are rarely if ever used. I mean, how often does anybody need IBA::add() to add an int8 image to an int16 image? So this PR rewrites those macros to simplify the cases as follows:

  • The common pixel data types are float, half, uint8, and uint16.

  • Specialized versions are fully expanded only when the result and input images are one of these types. Images not of one of those types are first automatically converted to float to make them reduce to a common case. That makes uncommon pixel data types like (signed) int16 not expand the template, but rather convert to and from float and use the float specializations.

  • For binary and ternary operations (those with 2 or 3 image inputs), if the pixel types of the inputs don't match, we make sure they are all converted to float. So, for example, we don't need a specialized version that adds a half image to a uint16 image -- just convert them to float and use the common case. But we do specialize if the inputs are all the same common type, such as adding two uint16 images.

  • Assume that commonly, the result image will either be float, or will be the same pixel data type as the inputs. Other combinations trigger assignment to a temporary float IB, then copying with conversion to the uncommon user-supplied result buffer.

  • Additionally, we cut down on a little bit more templating by moving some "deep" methods from the type-templated ImageBuf::Iterator to its type-generic non-templated base class IteratorBase.
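
To make the type-reduction rules in the list above concrete, here is a minimal sketch of how a (result, input A, input B) request for a binary operation might collapse to a small set of specializations. It is not the actual DISPATCH macro code: `PixType`, `is_common`, `reduce`, and `count_reduced_specializations` are hypothetical names, the nine-type enum is only an assumption for counting purposes, and the rules encode my reading of the bullets rather than the real implementation.

```cpp
#include <cstddef>
#include <set>
#include <utility>

// Hypothetical stand-in for pixel data types; this is NOT OIIO's TypeDesc,
// and the choice of nine types is an assumption made for illustration.
enum class PixType { UInt8, Int8, UInt16, Int16, UInt32, Int32, Half, Float, Double };

constexpr PixType kAllTypes[] = {
    PixType::UInt8,  PixType::Int8,  PixType::UInt16, PixType::Int16, PixType::UInt32,
    PixType::Int32,  PixType::Half,  PixType::Float,  PixType::Double
};

// The four "common" types named in the list above.
constexpr bool is_common(PixType t)
{
    return t == PixType::Float || t == PixType::Half
           || t == PixType::UInt8 || t == PixType::UInt16;
}

// first  = type the specialized kernel writes,
// second = type both inputs are read as.
using Specialization = std::pair<PixType, PixType>;

// Reduce a requested (result, A, B) combination to the specialization
// that would handle it under the rules sketched above.
inline Specialization reduce(PixType r, PixType a, PixType b)
{
    // Inputs that differ, or that are not a common type, are converted to float.
    const PixType in = (a == b && is_common(a)) ? a : PixType::Float;
    // The result is written directly only if it is float or matches the inputs;
    // anything else goes through a temporary float buffer and a converting copy.
    const PixType out = (r == PixType::Float || r == in) ? r : PixType::Float;
    return { out, in };
}

// Count how many distinct specializations remain after reduction,
// enumerating every (result, A, B) combination of the hypothetical types.
inline std::size_t count_reduced_specializations()
{
    std::set<Specialization> seen;
    for (PixType r : kAllTypes)
        for (PixType a : kAllTypes)
            for (PixType b : kAllTypes)
                seen.insert(reduce(r, a, b));
    return seen.size();
}
```

Under these assumptions, full expansion over nine types would instantiate 9 × 9 × 9 = 729 combinations for a binary operation, while `count_reduced_specializations()` finds only 7 distinct cases. The real type list and macro logic may differ, so treat the numbers as an illustration of the scale of the reduction, not a measurement.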

The net result of all this is an awful lot less template expansion. With this in place, my laptop compiles imagebufalgo_pixelmath.cpp in 97s (vs 290s before) and imagebufalgo_addsub.cpp in 26s (vs 84s before). It takes a big bite out of all the IBA files, and reduces project-wide compile time by over 10%, around 30s out of 300s for a fresh, uncached, optimized build with 16 threads.

Signed-off-by: Larry Gritz <lg@larrygritz.com>
@lgritz lgritz changed the base branch from main to dev-3.0 October 7, 2024 17:17
@lgritz lgritz merged commit ff20241 into AcademySoftwareFoundation:dev-3.0 Oct 8, 2024
28 of 29 checks passed
@lgritz lgritz deleted the lg-dispatch-restrict branch October 8, 2024 15:56